A Prolog-based Environment 
for Reasoning about Programming Languages 
(Extended abstract) 



Roberto Bagnara¹, Patricia M. Hill², and Enea Zaffanella¹

¹ Department of Mathematics, University of Parma, Italy
{bagnara,zaffanella}@cs.unipr.it
² School of Computing, University of Leeds, UK
hill@comp.leeds.ac.uk

ECLAIR is a Prolog-based prototype system aiming to provide a functionally 
complete environment for the study, development and evaluation of programming 
language analysis and implementation tools. In this paper, we sketch the over- 
all structure of the system, outlining the main methodologies and technologies 
underlying its components. We also discuss the appropriateness of Prolog as the 
implementation language for the system: besides highlighting its strengths, we 
also point out a few potential weaknesses, hinting at possible solutions. 

Motivation for a flexible language resource. Static program analysis aims 
to find bugs, verify the absence of errors in software, and ensure the correctness of 
given optimizations. The standard and most widely used theoretical framework
behind static program analysis is abstract interpretation [8,9]. This framework,
which has been available for more than thirty years, allows us to separate the 
abstract domains, for representing the program's properties of interest, from 
the abstract interpreter that should mimic the execution of the programs on 
these domains. As far as the abstract domains are concerned, there is now a 
good choice of implementations offering a flexible precision/efficiency trade-off. 
For the abstract interpreter, one non-commercial interpreter that has been used 
for automatically verifying up to one million lines of C code is ASTRÉE [11];
however, ASTRÉE is specifically targeted at a particular class of programs and
program properties, so that widening its scope of application is likely to require
significant effort [7]. The interpreters provided by the ECLAIR system follow the
methodology described in [2], which handles most of the problematic features of
mainstream, single-threaded languages; the concrete semantics is based on
structural operational semantics extended to allow for infinite computations [10,
13, 15], and the abstract semantics is based on the work of Schmidt [16-18]. The
ECLAIR system is being designed to have a high degree of flexibility for the 
following reasons: 

— As there is a constantly evolving and, potentially, huge set of programming 
languages for which analyzers are needed, the prototype interpreter needs 
to target a variety of language features (including, for example, exception 
handling, functions and pointers) and have the capability of being extended 
to incorporate further languages and language features, not yet considered.
Thus, the overall environment has to be designed to provide a uniform
interface that delegates the details of the implementation of the run-time system
to independent components.

— In order to analyze for different types of program properties and error states, 
the system has to be able to connect with a wide variety of abstract domains. 
Note that the system's interface with the abstract domains and their
operations must be designed so that it can be coupled both with relational
domains, that is, those that can capture the relationships between different
data objects, and with the more efficient but less precise non-relational domains.
This flexibility in choice of domains should also be dynamic since, in order 
to control the precision/efficiency trade-off, the exact domain used may have 
to be changed during the analysis. 

— The aim is to have one system that can be used for teaching, for the de- 
velopment of new technologies and their evaluation and for demonstrating 
their application. For teaching the basics, we require simple robust systems 
that support the core language features such as the assignment, conditional 
and while commands found in all imperative languages. On the other hand, 
for research and development we need a highly modular and extensible sys- 
tem so that we can plug in new technologies without redefining the whole. 
Lastly, for demonstration, we need the system to be highly efficient with a 
well-structured user interface that can verify software that is written in real 
languages such as C and Java. 

From CLAIR to ECLAIR. The 'Combined Language and Abstract Interpretation
Resource' (CLAIR, http://www.cs.unipr.it/clair/) was initially developed for,
and used in, a teaching context, as a prototype system to support the study of,
and experimentation with, various aspects of programming language
implementation. The CLAIR system consists of the following functional
components:

— A parser module. Lexical and syntactic analyses of the source program are 
performed according to the concrete grammar of the considered target lan- 
guage, leading to the computation of the corresponding abstract syntax tree 
(AST), i.e., a Prolog term. 

— A static semantics module. The AST computed in the previous phase is 
examined so as to compute the static type of each program fragment and 
perform several context-dependent, well-formedness checks. 

— A dynamic semantics module. The concrete behavior of a program is speci- 
fied by adopting the small-step style of the structural operational semantics 
(SOS) approach [15]. The dynamic configurations of a running instance of 
the program (i.e., the states of the transition system) are appropriately en- 
coded in suitable Prolog terms, which are then evaluated according to the 
transition system's rules, leading to an interpreter for the language.
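
To make the small-step idea concrete, the following is a minimal sketch in Python rather than in the Prolog used by CLAIR. The mini-language, the tuple encoding of the AST and the names `eval_expr`, `step`, `run` and `SUM_PROGRAM` are all assumptions made for illustration; they are not SIL and not CLAIR's actual code. A configuration is a pair (command, store), and each branch of `step` plays the role of one transition rule.

```python
# Illustrative sketch only: a hypothetical mini-language, not SIL.
# AST nodes are nested tuples, standing in for the Prolog terms used by CLAIR.

def eval_expr(e, store):
    """Evaluate an expression in a store (a dict from variable names to ints)."""
    tag = e[0]
    if tag == "num":
        return e[1]
    if tag == "var":
        return store[e[1]]
    if tag == "add":
        return eval_expr(e[1], store) + eval_expr(e[2], store)
    if tag == "lt":
        return eval_expr(e[1], store) < eval_expr(e[2], store)
    raise ValueError(f"unknown expression: {e!r}")

SKIP = ("skip",)  # the terminated command

def step(cmd, store):
    """Perform one transition <cmd, store> -> <cmd', store'>."""
    tag = cmd[0]
    if tag == "assign":
        _, name, e = cmd
        return SKIP, {**store, name: eval_expr(e, store)}
    if tag == "seq":
        _, c1, c2 = cmd
        if c1 == SKIP:                      # <skip; c2, s> -> <c2, s>
            return c2, store
        c1_next, store_next = step(c1, store)
        return ("seq", c1_next, c2), store_next
    if tag == "if":
        _, b, c_then, c_else = cmd
        return (c_then if eval_expr(b, store) else c_else), store
    if tag == "while":                      # unfold the loop once
        _, b, body = cmd
        return ("if", b, ("seq", body, cmd), SKIP), store
    raise ValueError(f"unknown command: {cmd!r}")

def run(cmd, store):
    """Iterate the transition relation until the command has terminated."""
    while cmd != SKIP:
        cmd, store = step(cmd, store)
    return store

# s := 0; i := 1; while i < 6 do (s := s + i; i := i + 1)
SUM_PROGRAM = (
    "seq", ("assign", "s", ("num", 0)),
    ("seq", ("assign", "i", ("num", 1)),
     ("while", ("lt", ("var", "i"), ("num", 6)),
      ("seq", ("assign", "s", ("add", ("var", "s"), ("var", "i"))),
       ("assign", "i", ("add", ("var", "i"), ("num", 1)))))),
)
```

Calling `run(SUM_PROGRAM, {})` drives the program to termination one transition at a time; a non-terminating program would simply make `run` loop forever, which this sketch does not attempt to handle.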

The CLAIR system features two simple (though not simplistic) languages: SFL, 
a functional language; and SIL, a Pascal-like imperative language. By
programming in these languages, students are able to see at first hand how the rather
dry rules of the structural operational semantics [15] do in fact deliver a working
system capable of executing non-trivial programs, while providing a theoretical 
framework for proving interesting program properties. These pedagogic experi- 
ences hinted at how the same cleanness and flexibility goals that were driving 
the development of a teaching tool could also be pursued in the much more 
challenging context of research, leading to the development of ECLAIR (Ex- 
tended CLAIR), whose overall aim is the analysis of mainstream programming 
languages. Work on the new system, whose specification, implementation and
evaluation are still in progress, has led to both the design of new functional
modules and the restructuring of existing ones:

— The parser module has been extended with several other language instances 
including, e.g., the Java bytecode language and an almost complete parser 
for standard C. 

— The static semantics module has been instrumented to save the collected 
type information in a program annotation database, so as to make it readily 
available to later phases. 

— The dynamic semantics module has been refactored so as to distinguish 
the implementation of the concrete semantics rules (now given in the style 
of the big-step semantics, relating programs to final configurations) from 
the so-called concrete memory structure component that provides both the 
concrete memory and some of the control flow management utilities. The 
concrete memory structure component is partially implemented in the C++ 
language so as to greatly improve the efficiency of the generated interpreter, 
which is now able to execute non-toy programs using a reasonable amount 
of system resources. 

— A new abstract semantics module has been designed so as to allow for the 
computation of the abstract semantics of the program. As for the concrete 
dynamic semantics, a distinction is made between the abstract semantics 
rules and the specification of the abstract memory structure component. 
This last component is intended as a generic interface for many of the ab- 
stract domains and operators that have been proposed for static analysis 
applications, often implemented in a foreign language such as C++. The ab- 
stract semantics is obtained by means of a post-fixpoint computation: the 
intermediate results are saved, using the program annotation database, as 
annotations attached to the nodes of the AST; care is taken to allow for
the specification of context-sensitive analyses, whereby different annotations
can be attached to the same program fragment depending on the calling 
context. 
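
The post-fixpoint computation mentioned in the last item can be illustrated with a small Python sketch over the interval domain. Everything here is an assumption made for illustration: the function names, the use of a plain dictionary in place of the program annotation database, the annotation keys, and the restriction to a single counting loop. ECLAIR itself relies on abstract domains provided by external libraries, not on code like this.

```python
import math

# Illustrative sketch only: intervals are pairs (lo, hi); BOTTOM means "unreachable".
INF = math.inf
BOTTOM = None

def join(a, b):
    """Least upper bound of two intervals."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def leq(a, b):
    """Interval inclusion: does a describe no more values than b?"""
    if a is BOTTOM:
        return True
    if b is BOTTOM:
        return False
    return b[0] <= a[0] and a[1] <= b[1]

def widen(a, b):
    """Standard interval widening: any unstable bound jumps to infinity."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    lo = a[0] if a[0] <= b[0] else -INF
    hi = a[1] if a[1] >= b[1] else INF
    return (lo, hi)

def meet_lt(a, k):
    """Restrict an integer interval to values strictly less than k."""
    if a is BOTTOM or a[0] > k - 1:
        return BOTTOM
    return (a[0], min(a[1], k - 1))

def meet_ge(a, k):
    """Restrict an integer interval to values greater than or equal to k."""
    if a is BOTTOM or a[1] < k:
        return BOTTOM
    return (max(a[0], k), a[1])

def add_const(a, c):
    """Abstract counterpart of adding the constant c."""
    if a is BOTTOM:
        return BOTTOM
    return (a[0] + c, a[1] + c)

def analyze_counting_loop(init, bound, annotations):
    """Abstractly execute `i := init; while i < bound do i := i + 1`."""
    entry = (init, init)
    inv = entry                       # candidate invariant at the loop head
    while True:
        body_out = add_const(meet_lt(inv, bound), 1)
        nxt = join(entry, body_out)   # one application of the abstract transformer
        if leq(nxt, inv):
            break                     # inv is a post-fixpoint
        inv = widen(inv, nxt)
    # The dictionary stands in for the program annotation database.
    annotations["loop_head"] = inv
    annotations["loop_exit"] = meet_ge(inv, bound)
    return annotations
```

For `analyze_counting_loop(0, 100, {})` the iteration stabilizes after one widening step, attaching the interval from 0 to infinity to the loop head. A subsequent narrowing phase, omitted here, would be the usual way to recover the tighter upper bound.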

Reflections on using Prolog. Prolog has many benefits for the implementa- 
tion of a system such as ECLAIR, primarily due to the fact that Prolog is based 
on Horn clauses which are the basis of the formal specification of all the main 
components. Observe that, for the parser, we have the added advantage that the 
syntax can be specified by means of the Prolog definite clause grammar rules, 
which, being a notational variant of Horn clauses, can be executed directly
[14]. For the other three components (static semantics, dynamic semantics and
abstract semantics), the formal specification is provided in the form of sequents 
which map directly to Horn clauses; therefore each component can be regarded 
as an executable form of its specification. 
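
The correspondence between inference rules and executable code can be mimicked outside Prolog as well. The Python sketch below is an assumption-laden illustration, not part of ECLAIR: the expression forms, the two types and the function name `type_of` are hypothetical. Each branch corresponds to one typing rule, with the premises checked by recursive calls, much as each Horn clause would encode one sequent.

```python
# Illustrative sketch only: a hypothetical expression language with types "int" and "bool".

def type_of(expr, env):
    """Return the static type of expr under the typing environment env.

    Raises TypeError when no typing rule applies.
    """
    tag = expr[0]
    # Rule (num):   env |- n : int
    if tag == "num":
        return "int"
    # Rule (bool):  env |- b : bool
    if tag == "bool":
        return "bool"
    # Rule (var):   env(x) = T  implies  env |- x : T
    if tag == "var":
        if expr[1] not in env:
            raise TypeError(f"undeclared variable {expr[1]!r}")
        return env[expr[1]]
    # Rule (add):   env |- e1 : int  and  env |- e2 : int  imply  env |- e1 + e2 : int
    if tag == "add":
        if type_of(expr[1], env) == "int" and type_of(expr[2], env) == "int":
            return "int"
        raise TypeError("operands of + must both be int")
    # Rule (lt):    env |- e1 : int  and  env |- e2 : int  imply  env |- e1 < e2 : bool
    if tag == "lt":
        if type_of(expr[1], env) == "int" and type_of(expr[2], env) == "int":
            return "bool"
        raise TypeError("operands of < must both be int")
    raise TypeError(f"no typing rule for {tag!r}")
```

For example, `type_of(("lt", ("add", ("var", "x"), ("num", 1)), ("num", 10)), {"x": "int"})` yields `"bool"`, while an expression that adds a Boolean to an integer is rejected with a `TypeError`.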

It follows that, for each of the supported languages, the implementation of 
the formal static and concrete semantics allows us to perform a comprehensive 
series of tests so as to evaluate different aspects of its specification and hence 
help us build confidence that it does accurately match the intended semantics; 
note that, in the case of a real language such as C, the test results can be 
compared directly with those obtained using a standard C compiler. Regarding 
the abstract semantics rules, these also are almost directly translated to Prolog 
code. However, in this case, these rules are incomplete in the sense that they are 
parametric on the abstract domain and therefore the code implementing them 
must be interfaced with specialized libraries that support a variety of abstract 
domains. For ECLAIR, we use the Parma Polyhedra Library so that all the nu- 
merical abstractions provided by the PPL such as polyhedra, octagons, intervals 
and grids can be used by the analyzer [3-5]. Note that, as the PPL provides 
an almost complete interface to a number of Prolog systems, the ECLAIR-to-PPL
interface can be written entirely in Prolog. We have, however, had to find
appropriate strategies to overcome some inherent weaknesses of Prolog.

— Some components of the tool require great efficiency and are better expressed 
in an imperative style. This is the case for the concrete memory structure 
that implements, among other things, destructive updates. As already ex- 
plained, by partially implementing this structure in the C++ language, we 
have been able to significantly improve the performance of the interpreter. 
Also, for the analyzer, it is becoming clear that coding the abstract memory 
structure in Prolog (using the Prolog interface of the PPL for the
numerical abstract domains) is cumbersome. We will therefore implement that
component mainly in C++ and interface it to the Prolog analyzer using the generic,
portable Prolog-C++ interface distributed with the PPL (see [1,5]). The
C++ module of the abstract memory structure will in turn call Prolog for all 
the symbolic manipulation tasks. 

— The lack of types in Prolog makes simple type errors hard to detect. One 
possible solution is to use optional type declarations such as those provided 
by the Ciao Prolog system [6]. However, such non-standard annotations
could lead to a loss of portability of the ECLAIR system. A possibility we
are considering is the use of the TCLP prescriptive type system [12], the 
main problem being that it does not currently support SWI-Prolog. 

— Well-known deficiencies of Prolog module systems (above all their flat,
non-hierarchical nature) make the development of a modular system like ECLAIR
more difficult than it ought to be. 

Conclusion with ongoing and future work. We have sketched the overall 
structure of ECLAIR, outlining the main methodologies and technologies
underlying its components. We have also explained why we chose Prolog as the
implementation language. Apart from the continuing development of the concrete and
abstract interpreters, current work includes investigating several applications for 
the analyzer, particularly: string cleanness, absence of integer overflows, correct- 
ness of array operations, inference of ranking functions and termination analysis. 
Further extensions of the system are foreseen, in particular the
development of compiler modules (including code generation and optimizations):
besides the obvious applications in a teaching context, this will allow us to
evaluate the usefulness of the information extracted by the abstract semantics
module from the standpoint of optimized compilation as well.